Strong Platonic Representation Hypothesis
- 2025-05-25
Definition
Neural networks trained with the same objective and modality, but with different data and model architectures, converge to a universal latent space such that a translation between their respective representations can be learned without any pairwise correspondence.
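The sketch below illustrates one way such a translation could be learned from unpaired embeddings alone, using adversarial alignment plus a cycle-consistency loss. This is not the paper's method; the architectures, dimensions, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: unsupervised translation between two embedding spaces.
# All names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn

DIM_A, DIM_B, HIDDEN = 768, 1024, 512

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, d_out))

f_ab = mlp(DIM_A, DIM_B)   # translator: space A -> space B
f_ba = mlp(DIM_B, DIM_A)   # translator: space B -> space A
disc_b = mlp(DIM_B, 1)     # discriminator: real B embedding vs. translated A embedding
bce = nn.BCEWithLogitsLoss()

opt_t = torch.optim.Adam(list(f_ab.parameters()) + list(f_ba.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc_b.parameters(), lr=1e-4)

def train_step(emb_a, emb_b):
    """One update on unpaired batches of embeddings from models A and B."""
    # Discriminator: tell real B embeddings apart from translated A embeddings.
    fake_b = f_ab(emb_a).detach()
    d_loss = bce(disc_b(emb_b), torch.ones(len(emb_b), 1)) + \
             bce(disc_b(fake_b), torch.zeros(len(fake_b), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Translators: fool the discriminator, and reconstruct the original embedding
    # after the round trip A -> B -> A (cycle consistency) -- no paired data needed.
    fake_b = f_ab(emb_a)
    adv_loss = bce(disc_b(fake_b), torch.ones(len(fake_b), 1))
    cycle_loss = nn.functional.mse_loss(f_ba(fake_b), emb_a)
    t_loss = adv_loss + 10.0 * cycle_loss
    opt_t.zero_grad(); t_loss.backward(); opt_t.step()
    return d_loss.item(), t_loss.item()

# Usage with synthetic stand-ins for two unpaired embedding collections:
emb_a = torch.randn(64, DIM_A)
emb_b = torch.randn(64, DIM_B)
train_step(emb_a, emb_b)
```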
Relationship to the Platonic Representation Hypothesis
The Platonic Representation Hypothesis conjectures that all image models of sufficient size have the same latent representation. We propose a stronger, constructive version of this hypothesis for text models: the universal latent structure of text representations can be learned and, furthermore, harnessed to translate representations from one space to another without any paired data or encoders. In this work, we show that the Strong Platonic Representation Hypothesis holds in practice. Given unpaired examples of embeddings from two models with different architectures and training data, our method learns a latent representation in which the embeddings are almost identical.
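One way to make "almost identical" concrete is to measure, on a held-out set, the cosine similarity between translated and target embeddings and the top-1 matching accuracy. The sketch below assumes access to paired embeddings purely for evaluation; the hypothesis itself concerns learning the translation without such pairs.

```python
# Sketch: quantify how close translated embeddings are to their targets.
# The paired evaluation tensors are an assumption made only for measurement.
import torch
import torch.nn.functional as F

def alignment_metrics(translated, target):
    """translated, target: (n, d) tensors; row i of each comes from the same input."""
    translated = F.normalize(translated, dim=-1)
    target = F.normalize(target, dim=-1)
    sims = translated @ target.T                                  # pairwise cosine similarities
    mean_cos = sims.diag().mean().item()                          # average similarity of true pairs
    top1 = (sims.argmax(dim=1) == torch.arange(len(target))).float().mean().item()
    return mean_cos, top1

# Example with random stand-ins:
mean_cos, top1 = alignment_metrics(torch.randn(100, 768), torch.randn(100, 768))
```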